12. 8-bit Calculations
We’ve covered freezing the graph and optimizing for inference, but we haven’t yet covered quantization. So the next optimization we’ll discuss is converting the graph to perform 8-bit calculations. Here’s an example using the transform_graph tool:
~/tensorflow/bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=frozen_graph.pb \
--out_graph=eightbit_graph.pb \
--inputs=image_input \
--outputs=Softmax \
--transforms='
add_default_attributes
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
fuse_resize_and_conv
quantize_weights
quantize_nodes
strip_unused_nodes
sort_by_execution_order'
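If you’d rather drive the same transforms from Python instead of the Bazel-built binary, TensorFlow 1.x also ships a TransformGraph wrapper in tensorflow.tools.graph_transforms. Here’s a minimal sketch assuming the frozen_graph.pb from earlier and the same input/output names:
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Read the frozen graph from disk.
graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Same transform list as the command-line invocation above.
transforms = [
    'add_default_attributes',
    'remove_nodes(op=Identity, op=CheckNumerics)',
    'fold_constants(ignore_errors=true)',
    'fold_batch_norms',
    'fold_old_batch_norms',
    'fuse_resize_and_conv',
    'quantize_weights',
    'quantize_nodes',
    'strip_unused_nodes',
    'sort_by_execution_order',
]

eightbit_graph_def = TransformGraph(graph_def, ['image_input'], ['Softmax'], transforms)

# Write the transformed graph so it can be loaded like any other protobuf.
with tf.gfile.GFile('eightbit_graph.pb', 'wb') as f:
    f.write(eightbit_graph_def.SerializeToString())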
There’s a lot going on here; you can find more information in the TensorFlow Graph Transforms documentation. The gist is that the fold transforms look for subgraphs that always evaluate to the same result, then consolidate each such subgraph into a single Constant node.
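To see constant folding in isolation, here’s a toy sketch (the graph below is made up purely for illustration): the 2.0 * 3.0 subgraph always evaluates to 6.0, so fold_constants should collapse it into a single Const node.
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# A tiny graph: y = (2.0 * 3.0) + x. The multiplication never changes.
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, name='x')
    y = tf.add(tf.constant(2.0) * tf.constant(3.0), x, name='y')

folded = TransformGraph(g.as_graph_def(), ['x'], ['y'],
                        ['fold_constants(ignore_errors=true)'])

# The folded graph should have fewer nodes: the two Consts and the Mul
# are replaced by one precomputed Const.
print(len(g.as_graph_def().node), len(folded.node))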
quantize_weights quantizes values larger than 15 bits. It also adds nodes to convert back to floating point. The quantize_weights transform is mainly for reducing graph size; to get the desired quantized computation behaviour we’ll need quantize_nodes as well.
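To make the difference concrete, here’s a hedged sketch that applies only quantize_weights and compares serialized sizes; the exact savings depend on the model, so treat the printed numbers as illustrative.
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# quantize_weights stores the large weight constants in 8 bits (with
# Dequantize nodes added), but the computation itself stays in float.
weights_only = TransformGraph(graph_def, ['image_input'], ['Softmax'],
                              ['quantize_weights'])

print('frozen graph:      %d bytes' % len(graph_def.SerializeToString()))
print('quantized weights: %d bytes' % len(weights_only.SerializeToString()))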
Ok, let’s take a look:
from graph_utils import load_graph
sess, eightbit_ops = load_graph('eightbit_graph.pb')
print(len(eightbit_ops)) # 425
There are 425 operations, more than in the original frozen graph! That’s not a big deal: quantized computation generally requires extra nodes, and nodes that have no quantized equivalent are kept as floating point.
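If you’re curious which parts of the graph actually run in 8 bits, you can filter the operations by type. This sketch assumes load_graph returns tf.Operation objects, as the len() call above suggests:
from graph_utils import load_graph

sess, eightbit_ops = load_graph('eightbit_graph.pb')

# Ops whose type starts with 'Quantized' (e.g. QuantizedConv2D) are the
# 8-bit kernels; the rest are conversion ops such as QuantizeV2/Dequantize
# plus any nodes that stayed in floating point.
quantized = [op for op in eightbit_ops if op.type.startswith('Quantized')]
print(len(quantized), 'quantized ops out of', len(eightbit_ops))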